of asking the same question is: Is being a member of a particular row associated with being a
member of a particular column?
In this chapter, we describe two tests you can use to answer this question: the Pearson chi-square test
and the Fisher Exact test. We also explain how to estimate power and sample sizes for the chi-square
and Fisher Exact tests.
As with other statistical tests, you can run all the tests in this chapter on individual-level data in a
database, where there is one record per participant. But the tests in this chapter can also be run on
data that have already been summarized in the form of a cross-tab.
Most statistical software is set up to work with individual-level data. In that case, your data file
needs two columns for the association you want to test: one containing the categorical variable for
the treatment group (or whatever variable defines the rows of the cross-tab), and one containing the
categorical variable for the outcome. If you have the correct columns, all you have to do is tell your
statistical software which test or tests you want to run and which variables to use.
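To make the two-column layout concrete, here is a minimal Python sketch that tallies individual-level records into a cross-tab. The records, the column names `group` and `outcome`, and the helper `cross_tab` are all hypothetical, chosen only for illustration; in practice your statistical software (or, in Python, a function such as `pandas.crosstab`) does this tallying for you.

```python
from collections import Counter

# Hypothetical individual-level data: one record per participant,
# one column for treatment group and one for outcome.
records = [
    {"group": "Drug", "outcome": "Improved"},
    {"group": "Drug", "outcome": "Improved"},
    {"group": "Drug", "outcome": "Not improved"},
    {"group": "Placebo", "outcome": "Improved"},
    {"group": "Placebo", "outcome": "Not improved"},
    {"group": "Placebo", "outcome": "Not improved"},
]

def cross_tab(rows, row_var, col_var):
    """Count participants in each (row category, column category) cell."""
    counts = Counter((r[row_var], r[col_var]) for r in rows)
    row_levels = sorted({r[row_var] for r in rows})
    col_levels = sorted({r[col_var] for r in rows})
    # Counter returns 0 for combinations that never occur, so empty cells show as 0
    table = [[counts[(rl, cl)] for cl in col_levels] for rl in row_levels]
    return row_levels, col_levels, table

row_levels, col_levels, table = cross_tab(records, "group", "outcome")
print(row_levels)  # ['Drug', 'Placebo']
print(col_levels)  # ['Improved', 'Not improved']
print(table)       # [[2, 1], [1, 2]]
```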
Most statistical software can also run these tests on summarized data rather than individual-level
data, as long as you set the appropriate option when running the tests. In contrast, online calculators
for these tests expect you to have already cross-tabulated the data. These calculators usually display
an empty table, and you enter the counts into its cells to run the calculation.
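Going the other way can help show why summarized data are enough. The sketch below assumes hypothetical (group, outcome, count) rows and a made-up helper `expand`, and rebuilds one record per participant from the summarized table. A weighting or count option in statistical software has roughly the same effect without physically duplicating rows; the exact option name varies by package, so none is named here.

```python
# Hypothetical summarized data: each row is one cell of the cross-tab plus its count.
summary = [
    ("Drug", "Improved", 2),
    ("Drug", "Not improved", 1),
    ("Placebo", "Improved", 1),
    ("Placebo", "Not improved", 2),
]

def expand(summary_rows):
    """Turn (group, outcome, count) rows into one record per participant."""
    records = []
    for group, outcome, count in summary_rows:
        # Repeat the same (group, outcome) record `count` times
        records.extend({"group": group, "outcome": outcome} for _ in range(count))
    return records

records = expand(summary)
print(len(records))  # 6
```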
Examining Two Variables with the Pearson Chi-Square Test
The most commonly used statistical test of association between two categorical variables is the
chi-square test of association, developed by Karl Pearson around 1900. It’s called the chi-square test
because it involves calculating a number, called a test statistic, that approximately follows the
chi-square distribution when the two variables are not associated. Many other statistical tests also use
the chi-square distribution, but the test of association is by far the most popular. In this book,
whenever we refer to a chi-square test without specifying which one, we mean the Pearson chi-square
test of association between two categorical variables. (Note that some books write χ² or X² instead of
spelling out the term chi-square.)
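Once the test statistic is computed, the p-value is the probability that a chi-square random variable with the appropriate degrees of freedom is at least as large as the observed statistic. As a rough, standard-library-only sketch, the function below computes that upper-tail area using closed-form series that hold for whole-number degrees of freedom. The name `chi2_sf` is our own; in practice you would rely on your statistical software or a library routine such as SciPy's `scipy.stats.chi2.sf`.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Closed-form series valid for whole-number df:
      even df = 2k:    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
      odd  df = 2k+1:  erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{k} (x/2)^(i-1/2) / Gamma(i+1/2)
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k = df // 2
        total = sum(half ** i / math.factorial(i) for i in range(k))
        return math.exp(-half) * total
    k = (df - 1) // 2
    total = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

print(round(chi2_sf(3.841, 1), 3))  # 0.05
print(round(chi2_sf(5.991, 2), 3))  # 0.05
```

The familiar critical values 3.841 (1 df), 5.991 (2 df), and 7.815 (3 df) each give a p-value of about 0.05, which is a quick sanity check on the sketch.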
Understanding how the chi-square test works
You don’t have to understand the equations behind the chi-square test if you have a computer to do the
calculations, which is the best approach, although it is possible to do them by hand. This means you
technically don’t have to read this section. But we encourage you to do so anyway, because we think
you’ll better appreciate the strengths and limitations of the test if you know its mathematical
underpinnings. Here, we walk you through conducting a chi-square test manually (which you can also do
in Microsoft Excel).
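As a preview of the manual steps, here is a minimal Python sketch of the standard Pearson calculation: the expected count for each cell is (row total × column total) / grand total, each cell contributes (observed - expected)² / expected to the statistic, and the degrees of freedom are (number of rows - 1) × (number of columns - 1). The function name `pearson_chi_square` and the counts are hypothetical. The sketch assumes every row and column total is greater than zero and applies no continuity correction; some software applies Yates’ continuity correction to 2×2 tables by default, so its reported statistic can differ slightly.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a cross-tab of observed counts.

    table: list of rows, each a list of observed counts.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if row and column membership were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

counts = [[20, 10], [10, 20]]  # hypothetical 2x2 cross-tab
statistic, df = pearson_chi_square(counts)
print(round(statistic, 3), df)  # 6.667 1
```

Here every expected count is 30 × 30 / 60 = 15, and each of the four cells contributes 5² / 15, for a total of about 6.667 on 1 degree of freedom.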